Finding the global semantic representation in GAN through Frechet Mean
An ideally disentangled latent space in a GAN provides a global
representation of the latent space in semantic attribute coordinates. In other
words, since this disentangled latent space is a vector space, there
exists a global semantic basis in which each basis component describes one
attribute of the generated images. In this paper, we propose an unsupervised method
for finding this global semantic basis in the intermediate latent space in
GANs. This semantic basis represents sample-independent meaningful
perturbations that change the same semantic attribute of an image on the entire
latent space. The proposed global basis, called Fr\'echet basis, is derived by
introducing Fr\'echet mean to the local semantic perturbations in a latent
space. Fr\'echet basis is discovered in two stages. First, the global semantic
subspace is discovered by the Fr\'echet mean in the Grassmannian manifold of
the local semantic subspaces. Second, Fr\'echet basis is found by optimizing a
basis of the semantic subspace via the Fr\'echet mean in the Special Orthogonal
Group. Experimental results demonstrate that Fr\'echet basis provides better
semantic factorization and robustness compared to the previous methods.
Moreover, we suggest a basis refinement scheme for the previous methods.
Quantitative experiments show that the refined basis achieves better semantic
factorization while constrained to the same semantic subspace given by the
previous method.
Comment: 25 pages, 21 figures
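The first stage described above, a Fréchet mean of local semantic subspaces on the Grassmannian manifold, can be sketched with the extrinsic (chordal-metric) Fréchet mean, which has a closed form via averaged projection matrices. This is an illustrative stand-in under the chordal metric, not the paper's actual geodesic computation, and all names are hypothetical:

```python
import numpy as np

def frechet_mean_grassmann(bases, k):
    """Extrinsic Frechet mean of k-dimensional subspaces under the
    chordal (projection) metric: average the projection matrices and
    take the top-k eigenvectors of the mean projector."""
    # Each basis is an (n, k) matrix with orthonormal columns.
    mean_proj = np.mean([B @ B.T for B in bases], axis=0)
    # eigh returns eigenvalues in ascending order; reverse for largest first.
    eigvals, eigvecs = np.linalg.eigh(mean_proj)
    return eigvecs[:, ::-1][:, :k]  # orthonormal basis of the mean subspace

# Toy usage: three nearby 2-dimensional subspaces of R^5.
rng = np.random.default_rng(0)
anchor = np.linalg.qr(rng.normal(size=(5, 2)))[0]
bases = [np.linalg.qr(anchor + 0.05 * rng.normal(size=(5, 2)))[0]
         for _ in range(3)]
mean_basis = frechet_mean_grassmann(bases, k=2)
print(mean_basis.shape)  # (5, 2)
```

The chordal mean is a common tractable surrogate for the intrinsic Grassmannian Fréchet mean, which generally requires iterative optimization.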
Analyzing the Latent Space of GAN through Local Dimension Estimation
The impressive success of style-based GANs (StyleGANs) in high-fidelity image
synthesis has motivated research to understand the semantic properties of their
latent spaces. In this paper, we approach this problem through a geometric
analysis of latent spaces as a manifold. In particular, we propose a local
dimension estimation algorithm for arbitrary intermediate layers in a
pre-trained GAN model. The estimated local dimension is interpreted as the
number of possible semantic variations from this latent variable. Moreover,
this intrinsic dimension estimation enables unsupervised evaluation of
disentanglement for a latent space. Our proposed metric, called Distortion,
measures an inconsistency of intrinsic tangent space on the learned latent
space. Distortion is purely geometric and does not require any additional
attribute information. Nevertheless, Distortion shows a high correlation with
the global-basis-compatibility and supervised disentanglement score. Our work
is the first step towards selecting the most disentangled latent space among
various latent spaces in a GAN without attribute labels.
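One plausible reading of local dimension estimation at a latent point is counting the numerically significant singular values of the generator's Jacobian there. The finite-difference scheme, toy generator, and rank threshold below are illustrative assumptions, not the paper's estimator:

```python
import numpy as np

def local_dimension(generator, z, eps=1e-4, tol=1e-6):
    """Estimate the local intrinsic dimension at latent point z as the
    numerical rank of the generator's Jacobian (finite differences)."""
    d = z.shape[0]
    base = generator(z)
    cols = []
    for i in range(d):
        dz = np.zeros(d)
        dz[i] = eps
        cols.append((generator(z + dz) - base) / eps)
    J = np.stack(cols, axis=1)          # (output_dim, latent_dim)
    s = np.linalg.svd(J, compute_uv=False)
    return int(np.sum(s > tol * s[0]))  # singular values above threshold

# Toy generator whose output locally varies in only 2 of 4 latent directions.
def toy_gen(z):
    return np.array([np.sin(z[0]), z[1] ** 2 + 1.0, z[0] * z[1]])

z0 = np.array([0.3, 0.5, 0.1, -0.2])
print(local_dimension(toy_gen, z0))  # 2
```

The estimated number then corresponds to the count of possible semantic variations around that latent variable, as the abstract describes.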
Minimal Width for Universal Property of Deep RNN
A recurrent neural network (RNN) is a widely used deep-learning network for
dealing with sequential data. Imitating a dynamical system, an infinite-width
RNN can approximate any open dynamical system in a compact domain. In general,
deep networks with bounded widths are more effective than wide networks in
practice; however, the universal approximation theorem for deep narrow
structures has yet to be extensively studied. In this study, we prove the
universality of deep narrow RNNs and show that the upper bound of the minimum
width for universality can be independent of the length of the data.
Specifically, we show that a deep RNN with ReLU activation can approximate any
continuous function or function with the widths and
, respectively, where the target function maps a finite
sequence of vectors in to a finite sequence of vectors in
. We also compute the additional width required if the
activation function is or more. In addition, we prove the universality
of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer
perceptron and an RNN, our theory and proof technique can be an initial step
toward further research on deep RNNs.
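The "deep narrow" architecture the theorem concerns, a stack of fixed-width ReLU recurrent layers applied to a sequence, can be sketched as follows. This is an illustrative model of the setting, not the width-optimal construction from the proof, and all parameter choices are arbitrary:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class DeepRNN:
    """Minimal stacked (deep) RNN with ReLU activation; every layer keeps
    the same small hidden width, mirroring the deep narrow setting."""
    def __init__(self, d_in, width, depth, d_out, seed=0):
        rng = np.random.default_rng(seed)
        dims = [d_in] + [width] * depth
        self.Wx = [rng.normal(0, 0.5, (dims[i + 1], dims[i]))
                   for i in range(depth)]
        self.Wh = [rng.normal(0, 0.5, (width, width)) for _ in range(depth)]
        self.b = [np.zeros(width) for _ in range(depth)]
        self.Wo = rng.normal(0, 0.5, (d_out, width))

    def forward(self, xs):
        """Map a finite sequence of input vectors to a sequence of outputs."""
        h = [np.zeros(W.shape[0]) for W in self.Wh]  # per-layer hidden state
        ys = []
        for x in xs:
            inp = x
            for l in range(len(self.Wh)):
                h[l] = relu(self.Wx[l] @ inp + self.Wh[l] @ h[l] + self.b[l])
                inp = h[l]
            ys.append(self.Wo @ inp)
        return ys

rnn = DeepRNN(d_in=3, width=4, depth=5, d_out=2)
out = rnn.forward([np.ones(3) for _ in range(6)])
print(len(out), out[0].shape)  # 6 (2,)
```

Note that the hidden width (4 here) stays fixed regardless of the sequence length, which is the property the universality result makes precise.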
Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANs
The discovery of the disentanglement properties of the latent space in GANs
has motivated extensive research on finding semantically meaningful directions
in it. In this paper, we suggest that the disentanglement property is closely
related to the geometry of the latent space. In this regard, we propose an
unsupervised method for finding the semantic-factorizing directions on the
intermediate latent space of GANs based on the local geometry. Intuitively, our
proposed method, called Local Basis, finds the principal variation of the
latent space in the neighborhood of the base latent variable. Experimental
results show that the local principal variation corresponds to the semantic
factorization and traversing along it provides strong robustness to image
traversal. Moreover, we suggest an explanation for the limited success in
finding the global traversal directions in the latent space, especially W-space
of StyleGAN2. We show that W-space is warped globally by comparing the local
geometry, discovered from Local Basis, through the metric on Grassmannian
Manifold. The global warpage implies that the latent space is not well-aligned
globally and therefore the global traversal directions are bound to show
limited success on it.
Comment: 23 pages, 19 figures
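The two ingredients above, extracting local principal directions of variation and comparing local geometry through a metric on the Grassmannian manifold, can be sketched with an SVD and principal angles. The toy Jacobians stand in for the local linearization of the generator at a latent point; this is a schematic reading, not the paper's exact procedure:

```python
import numpy as np

def local_basis(jacobian, k):
    """Top-k right singular vectors of a local Jacobian: the principal
    directions of variation in the latent space at that point."""
    _, _, Vt = np.linalg.svd(jacobian)
    return Vt[:k].T  # (latent_dim, k), orthonormal columns

def grassmann_distance(B1, B2):
    """Geodesic distance between two subspaces via their principal
    angles (arccos of singular values of B1^T B2)."""
    s = np.linalg.svd(B1.T @ B2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))  # clip guards rounding error
    return float(np.linalg.norm(theta))

rng = np.random.default_rng(0)
J_a = rng.normal(size=(8, 5))              # stand-in Jacobian at point a
J_b = J_a + 0.01 * rng.normal(size=(8, 5)) # slightly perturbed point b
Ba, Bb = local_basis(J_a, 2), local_basis(J_b, 2)
print(round(grassmann_distance(Ba, Bb), 4))
```

A large subspace distance between nearby base points would indicate the global warpage the abstract describes: locally meaningful directions that do not align across the latent space.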
Disentangling the correlated continuous and discrete generative factors of data
Real-world data typically include discrete generative factors, such as category labels and the existence of objects, as well as continuous generative factors. Continuous generative factors may be dependent on or independent of discrete generative factors. For instance, an intra-class variation of a category is dependent on the discrete generative factor, whereas a common variation across all categories is not. Most previous attempts to integrate discrete generative factors into disentanglement assumed statistical independence between the continuous and discrete variables. In this paper, we propose a Variational Autoencoder (VAE) model capable of disentangling both the continuous and discrete generative factors. To represent these generative factors, we introduce two sets of continuous latent variables: a private variable and a public variable. The private and public variables represent the intra-class variations and the common variations across categories, respectively. Our proposed framework models the private variable as a Gaussian mixture and the public variable as a Gaussian. Each mode of the private variable is responsible for one class of the discrete variable. Our proposed model, called Discond-VAE, DISentangles the class-dependent CONtinuous factors from the Discrete factors by introducing private variables. The experiments showed that Discond-VAE can discover private and public factors from data. Moreover, even on a dataset with only public factors, Discond-VAE does not fail and adapts its private variables to represent the public factors.
(c) 2022 Elsevier Ltd. All rights reserved.
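The latent layout described above, a class-conditional Gaussian-mixture private variable alongside a single shared Gaussian public variable, can be sketched at sampling time as follows. This is a minimal illustration of the structure, not the Discond-VAE training objective, and all names and dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents(class_idx, mix_means, mix_logvars, public_dim):
    """Sketch of the Discond-VAE latent layout: the private variable is
    drawn from the Gaussian-mixture mode tied to the discrete class; the
    public variable is drawn from one shared Gaussian."""
    mu = mix_means[class_idx]
    std = np.exp(0.5 * mix_logvars[class_idx])
    private = mu + std * rng.normal(size=mu.shape)  # class-dependent
    public = rng.normal(size=public_dim)            # class-independent
    return private, public

# Illustrative setup: 3 classes, 2-D private variable, 4-D public variable.
mix_means = rng.normal(0, 2.0, size=(3, 2))  # one mixture mode per class
mix_logvars = np.zeros((3, 2))
priv, pub = sample_latents(1, mix_means, mix_logvars, public_dim=4)
print(priv.shape, pub.shape)  # (2,) (4,)
```

Tying one mixture mode to each class is what lets the private variable capture intra-class variation while the public variable stays common to all categories.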